PyVBMC Example 2: Understanding the inputs and the output trace

In this notebook, we discuss in more detail the inputs and the output trace of the PyVBMC algorithm, including how to set the parameter bounds and the starting point. We also describe the items in the output trace, and display the evolution of the variational posterior at each iteration.

This notebook is Part 2 of a series of notebooks in which we present various example usages for VBMC with the PyVBMC package.

import numpy as np
import scipy.stats as scs
from pyvbmc.vbmc import VBMC

1. Model definition: log-likelihood, log-prior, and log-joint

We use the same toy target function as in Example 1, a broad Rosenbrock’s banana function in \(D = 2\).

D = 2  # still in 2-D


def log_likelihood(theta):
    """D-dimensional Rosenbrock's banana function."""
    theta = np.atleast_2d(theta)

    x, y = theta[:, :-1], theta[:, 1:]
    return -np.sum((x**2 - y) ** 2 + (x - 1) ** 2 / 100, axis=1)
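As a quick sanity check (not part of the original notebook), note that the log-likelihood is a sum of non-positive terms, so its global maximum is 0, attained when \(x^2 = y\) and \(x = 1\), i.e. at \(\theta = (1, 1)\):

```python
import numpy as np


def log_likelihood(theta):
    """D-dimensional Rosenbrock's banana function."""
    theta = np.atleast_2d(theta)
    x, y = theta[:, :-1], theta[:, 1:]
    return -np.sum((x**2 - y) ** 2 + (x - 1) ** 2 / 100, axis=1)


# Maximum value 0 at the optimum (1, 1); slightly negative elsewhere
print(log_likelihood([1.0, 1.0]))
print(log_likelihood([0.0, 0.0]))
```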

To make the problem more interesting, we will assume the parameters are constrained to be positive, that is \(x_d > 0\) for \(1 \le d \le D\).

Since the parameters are strictly positive, we impose an independent exponential prior on each variable (again, you could choose here whatever prior is appropriate for your problem). We then define the log-joint (our target) as the sum of the log-likelihood and the log-prior.

prior_tau = 3 * np.ones((1, D))  # Length scale of the exponential prior


def log_prior(x):
    """Independent exponential prior."""
    x = np.atleast_2d(x)  # handle batched input like the likelihood does
    return np.sum(scs.expon.logpdf(x, scale=prior_tau), axis=1)


def log_joint(x):
    """log-density of the joint distribution."""
    return log_likelihood(x) + log_prior(x)
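Note that the exponential prior assigns zero density (a log-pdf of \(-\infty\)) to negative values, consistent with the positivity constraint we will encode via the bounds in the next section. A quick check:

```python
import numpy as np
import scipy.stats as scs

# Exponential log-pdf: log(1/tau) - x/tau for x >= 0, and -inf otherwise
print(scs.expon.logpdf(-1.0, scale=3.0))  # -inf: no mass on negative values
print(scs.expon.logpdf(3.0, scale=3.0))   # -log(3) - 1
```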

2. Parameter setup: bounds and starting point

Choosing parameter bounds

Bound settings require some discussion:

  1. The specified (hard) bounds are not included in the domain, so in this case, since we want the parameters to be positive, we can set the lower bounds LB = 0, knowing that the parameters will always be strictly greater than zero.

LB = np.zeros((1, D))  # Lower bounds
  2. Currently, PyVBMC does not support half-bounds, so we need to specify a finite upper bound (it cannot be inf). We pick something very large relative to our prior. However, do not go crazy here by picking something impossibly large, otherwise PyVBMC will likely fail.

UB = 10 * prior_tau  # Upper bounds
  3. Plausible bounds need to be meaningfully different from the hard bounds, so here we cannot set the plausible lower bounds PLB = 0. We follow the same strategy as in the first example, picking the central ~68% credible interval of the prior.

PLB = scs.expon.ppf(0.159, scale=prior_tau)
PUB = scs.expon.ppf(0.841, scale=prior_tau)
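For the exponential distribution these quantiles have a simple closed form, \(F^{-1}(p) = -\tau \ln(1 - p)\), which we can use to double-check the plausible bounds (the values match those printed by `print(vbmc)` below):

```python
import numpy as np
import scipy.stats as scs

prior_tau = 3.0  # length scale of the exponential prior

# Closed-form exponential quantile function: F^{-1}(p) = -tau * log(1 - p)
plb_closed = -prior_tau * np.log(1 - 0.159)
pub_closed = -prior_tau * np.log(1 - 0.841)

# Agrees with scipy's ppf
assert np.isclose(plb_closed, scs.expon.ppf(0.159, scale=prior_tau))
assert np.isclose(pub_closed, scs.expon.ppf(0.841, scale=prior_tau))
print(plb_closed, pub_closed)  # roughly 0.5195 and 5.5166
```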

Choosing the starting point

Good practice is to initialize PyVBMC in a region of high posterior density.

  • With no additional information, uninformed approaches to choose the starting point \(\mathbf{x}_0\) include picking the mode of the prior, or, if you need multiple points, a random draw from the prior or a random point uniformly chosen inside the plausible box. None of these approaches is particularly recommended, though, especially when facing more difficult problems.

  • It is generally better to adopt an informed starting point \(\mathbf{x}_0\). A good choice is typically a point in the approximate vicinity of the mode of the posterior (no need to start exactly at the mode). Clearly, it is hard to know in advance the location of the mode, but a common approach is to run a few optimization steps on the target (starting from one of the uninformed options above).

For this example, we cheat and start in the proximity of the mode, exploiting our knowledge of the toy likelihood, for which we know the location of the maximum.

x0 = np.ones((1, D))  # Optimum of the Rosenbrock function
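Had we not known the optimum, a minimal sketch of the informed approach mentioned above would look as follows: run a few optimization steps on the negative log-joint, starting from the prior mean. The functions and bounds mirror this example, but the optimizer choice and settings (L-BFGS-B, 20 iterations) are illustrative assumptions, not part of the original notebook:

```python
import numpy as np
import scipy.stats as scs
from scipy.optimize import minimize

D = 2
prior_tau = 3 * np.ones((1, D))


def log_likelihood(theta):
    theta = np.atleast_2d(theta)
    x, y = theta[:, :-1], theta[:, 1:]
    return -np.sum((x**2 - y) ** 2 + (x - 1) ** 2 / 100, axis=1)


def log_joint(x):
    return log_likelihood(x) + np.sum(scs.expon.logpdf(x, scale=prior_tau))


# Start from the prior mean and take a limited number of quasi-Newton steps
x_start = prior_tau.ravel()
res = minimize(
    lambda x: -log_joint(x).item(),  # minimize the negative log-joint
    x_start,
    method="L-BFGS-B",
    bounds=[(1e-6, 30.0)] * D,  # stay strictly inside the hard bounds
    options={"maxiter": 20},    # a rough optimum suffices for a starting point
)
x0 = np.atleast_2d(res.x)
print(x0)
```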

3. Initialize and run inference

Finally, we initialize and run PyVBMC.

Output trace

In each iteration, the PyVBMC output trace prints the following information:

  • Iteration: iteration number (starting from 0, which follows the initial design).

  • f-count: total number of the target function evaluations so far.

  • Mean[ELBO]: estimated mean of the ELBO (the variational lower bound on the log model evidence).

  • Std[ELBO]: standard deviation of the estimated ELBO.

  • sKL-iter[q]: Gaussianized symmetrized Kullback-Leibler divergence between the variational posterior in the previous and current iteration. In brief, a measure of how much the variational posterior \(q\) changes between iterations.

  • K[q]: Number of mixture components of the variational posterior.

  • Convergence: A number that summarizes the stability of the current solution, by weighing various factors such as the change in the elbo and in the variational posterior across iterations, and the uncertainty in the elbo. Values (much) less than 1 correspond to a stable solution and thus suggest convergence of the algorithm.

  • Action: Major actions taken by the algorithm in the current iteration. Common actions/events are:

    • start warm-up and end warm-up: start and end of the initial warm-up stage;

    • rotoscale (and undo rotoscale): rotate and re-scale the inference space to make inference more tractable (undo the rotoscaling if there is no significant improvement);

    • trim data: remove some points to increase stability;

    • stable: solution stable for multiple iterations;

    • finalize: refine the variational approximation with additional mixture components.
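To build intuition for the sKL column, here is a minimal sketch of the symmetrized KL divergence between two 1-D Gaussians. PyVBMC's sKL-iter[q] uses a multivariate, moment-matched ("Gaussianized") version of this idea; the averaging convention below is an illustrative assumption:

```python
import numpy as np


def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) in nats (closed form)."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def skl_gauss(m1, s1, m2, s2):
    """Symmetrized KL: average of the two directed divergences."""
    return 0.5 * (kl_gauss(m1, s1, m2, s2) + kl_gauss(m2, s2, m1, s1))


print(skl_gauss(0.0, 1.0, 0.0, 1.0))  # identical distributions -> 0.0
print(skl_gauss(0.0, 1.0, 1.0, 1.0))  # mean shift of one std -> 0.5
```

A value near zero thus indicates that the variational posterior barely changed between iterations, which is one of the ingredients of the Convergence statistic.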

Iteration plotting

In this example, we also switch on the plotting of the current variational posterior at each iteration with the user option plot. Regarding the iteration plots:

  • Each iteration plot shows the multidimensional variational posterior in the current iteration via one- and two-dimensional marginal plots for each variable and pairs of variables, as you would get from vp.plot() (see also Example 1).

  • In each plot, blue circles represent points in the training set of the GP surrogate model, and orange dots are points added to the training set in this iteration.

  • The red crosses are the centers of the variational mixture components.

  • Since running PyVBMC with plotting on can occupy a lot of space in the notebook, remember that in Jupyter notebooks you can toggle output scrolling on and off via the “Cell > Current Outputs > Toggle Scrolling” and “Cell > All Output > Toggle Scrolling” menus.

# initialize VBMC
user_options = {"plot": True}
vbmc = VBMC(log_joint, x0, LB, UB, PLB, PUB, user_options)
# Printing will give a summary of the vbmc object:
# bounds, user options, etc.:
print(vbmc)
VBMC:
    dimension = 2,
    x0: [[-0.2366, -0.2366]] : ndarray,
    lower bounds: [[0., 0.]] : ndarray,
    upper bounds: [[30., 30.]] : ndarray,
    plausible lower bounds: [[0.5195, 0.5195]] : ndarray,
    plausible upper bounds: [[5.5166, 5.5166]] : ndarray,
    log-density = <function log_joint at 0x7fa436317280>,
    log-prior = None,
    prior sampler = None,
    variational posterior = VariationalPosterior:
        dimension = 2,
        num. components = 2,
        means: 
            [[1., 1.],
             [1., 1.]] : ndarray,
        weights: [[0.5, 0.5]] : ndarray,
        sigma (per-component scale): [[0.001, 0.001]] : ndarray,
        lambda (per-dimension scale): 
            [[1.],
             [1.]] : ndarray,
        stats = None,
    Gaussian process = None,
    user options = User Options:
        plot: True (Plot marginals of variational posterior at each iteration)
# run VBMC
vp, elbo, elbo_sd, _, _ = vbmc.optimize()
Beginning variational optimization assuming EXACT observations of the log-joint.
 Iteration  f-count    Mean[ELBO]    Std[ELBO]    sKL-iter[q]   K[q]  Convergence  Action
     0         10           6.64         3.78    140072.47        2        inf     start warm-up
     1         15           3.70         7.70        19.32        2        inf     
     2         20          18.71        40.32         5.71        2        319     
     3         25           3.16        17.74         7.79        2        295     
     4         30          -2.48         0.24         8.43        2        218     
     5         35          -1.43         4.77         1.33        2       50.7     
     6         40          -2.45         0.06         0.36        2       12.1     
     7         45          -2.40         0.02         0.01        2      0.416     end warm-up
     8         50          -2.27         0.12         0.49        2       12.1     
     9         55          -2.18         0.26         0.43        2       11.3     
    10         60          -2.20         0.11         0.07        3       2.11     
    11         65          -2.20         0.12         0.18        4       4.72     
    12         70          -2.19         0.01         0.04        4       1.13     
    13         75          -2.12         0.01         0.01        5      0.467     
    14         75          -1.75         0.42         6.09        6        146     rotoscale
    15         80          -1.95         0.51         0.03        6          3     
    16         85          -2.28         0.16         1.21        6       30.2     
    17         90          -2.04         0.15         0.24        6        6.9     
    18         95          -2.03         0.07         0.04        7        1.3     
    19        100          -2.06         0.06         0.08        8       2.05     
    20        100          -1.62         0.18         5.19        9        124     rotoscale
    21        105          -1.70         0.12         0.09       10        2.7     
    22        110          -1.65         0.47         0.13       11       4.87     
    23        115          -2.75         0.41         4.45       11        110     
    24        120          -2.11         0.01         0.57       10       15.6     
    25        125          -2.00         0.00         0.01       10       0.49     
    26        130          -1.99         0.00         0.00       11     0.0847     
    27        135          -1.98         0.00         0.00       10     0.0715     
    28        140          -1.97         0.00         0.00       10     0.0469     
    29        145          -1.97         0.00         0.00       11     0.0491     
    30        150          -1.97         0.00         0.00       10     0.0373     
    31        155          -1.97         0.00         0.00       10     0.0419     rotoscale, undo rotoscale
    32        160          -1.96         0.00         0.00       10     0.0414     
    33        165          -1.96         0.00         0.00       11     0.0125     
    34        170          -1.94         0.00         0.00       14      0.118     stable
   inf        170          -1.89         0.00         0.03       50      0.118     finalize

(A plot of the current variational posterior is displayed after each iteration; the per-iteration images are omitted here.)
Inference terminated: variational solution stable for options.tol_stable_count fcn evaluations.
Estimated ELBO: -1.894 +/-0.002.
lml_true = -1.836  # ground truth, which we know for this toy scenario

print("The true log model evidence is:", lml_true)
print("The obtained ELBO is:", format(elbo, ".3f"))
print("The obtained ELBO_SD is:", format(elbo_sd, ".3f"))
The true log model evidence is: -1.836
The obtained ELBO is: -1.894
The obtained ELBO_SD is: 0.002
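Recall that the ELBO is a lower bound on the log model evidence (up to estimation noise in the bound itself), so the result should fall slightly below the ground truth. The gap, in nats, is a rough measure of the quality of the variational approximation:

```python
# Values from the run above
elbo, elbo_sd = -1.894, 0.002
lml_true = -1.836

gap = lml_true - elbo  # non-negative in expectation for a valid lower bound
print(f"The ELBO underestimates the log evidence by {gap:.3f} nats "
      f"(~{gap / elbo_sd:.0f} estimation SDs)")
```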

We can also print the vp object to see a summary of the variational solution:

print(vp)
VariationalPosterior:
    dimension = 2,
    num. components = 50,
    means: (2, 50) ndarray,
    weights: (1, 50) ndarray,
    sigma (per-component scale): (1, 50) ndarray,
    lambda (per-dimension scale): 
        [[0.7199],
         [1.2173]] : ndarray,
    stats = {
        'elbo': -1.8941216459037746,
        'elbo_sd': 0.001668854389419671,
        'e_log_joint': -4.139845774945299,
        'e_log_joint_sd': 0.001668854389419671,
        'entropy': 2.2457241290415246,
        'entropy_sd': 0.0,
        'stable': True : ndarray,
        'I_sk': (6, 50) ndarray,
        'J_sjk': (6, 50, 50) ndarray,
    }

4. Examine and visualize results

We will now examine the obtained variational posterior with an interactive plot.

Note: The interactive plot will not be visible in a notebook preview on GitHub. See below for a static image.

# Generate samples from the variational posterior
n_samples = int(3e5)
Xs, _ = vp.sample(n_samples)

# We compute the pdf of the approximate posterior on a 2-D grid
plot_lb = np.zeros(2)
plot_ub = np.quantile(Xs, 0.999, axis=0)
x1 = np.linspace(plot_lb[0], plot_ub[0], 400)
x2 = np.linspace(plot_lb[1], plot_ub[1], 400)

xa, xb = np.meshgrid(x1, x2)  # Build the grid
xx = np.vstack(
    (xa.ravel(), xb.ravel())
).T  # Convert grids to a vertical array of 2-D points
yy = vp.pdf(xx)  # Compute PDF values on specified points
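Samples from the variational posterior can also be used directly for summary statistics such as marginal means and credible intervals. A self-contained sketch, using stand-in Gaussian samples in place of `Xs, _ = vp.sample(n_samples)` (the location and scale values below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for samples from vp.sample(): any (n_samples, D) array works
Xs = rng.normal(loc=[1.0, 1.2], scale=[0.3, 0.5], size=(300_000, 2))

post_mean = Xs.mean(axis=0)                             # marginal posterior means
ci_lo, ci_hi = np.quantile(Xs, [0.025, 0.975], axis=0)  # 95% credible intervals
for d in range(Xs.shape[1]):
    print(f"x_{d}: mean {post_mean[d]:.3f}, "
          f"95% CI [{ci_lo[d]:.3f}, {ci_hi[d]:.3f}]")
```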
# Plot approximate posterior pdf (this interactive plot does not work in higher D)
import plotly.graph_objects as go

fig = go.Figure(
    data=[
        go.Surface(z=yy.reshape(x1.size, x2.size), x=x1, y=x2, showscale=False)
    ]
)
fig.update_layout(
    autosize=False,
    width=800,
    height=500,
    scene=dict(
        xaxis_title="x_0",
        yaxis_title="x_1",
        zaxis_title="Approximate posterior pdf",
    ),
)

# Compute and plot approximate posterior mean
post_mean = np.mean(Xs, axis=0)
fig.add_trace(
    go.Scatter3d(
        x=[post_mean[0]],
        y=[post_mean[1]],
        z=vp.pdf(post_mean),
        name="Approximate posterior mean",
        marker_symbol="diamond",
    )
)

# Find and plot approximate posterior mode
# post_mode = vp.mode()
# fig.add_trace(go.Scatter3d(x=[post_mode[0]], y=[post_mode[1]], z=vp.pdf(post_mode), name="Approximate posterior mode", marker_symbol="x"))
# Display posterior also as a non-interactive static image
from IPython.display import Image

Image(fig.to_image(format="png"))
../_images/pyvbmc_example_2_inputs_outputs_24_0.png

5. Conclusions

In this notebook, we have discussed in more detail the input parameter settings (bounds and starting point) and the output trace of PyVBMC.

An important step not yet covered in this example is a thorough validation of the results via diagnostics, which will be discussed in a later notebook.

Example 2: full code

The following cell includes in a single place all the code used in Example 2, without the extra fluff.

assert False  # skip this cell

import numpy as np
import scipy.stats as scs
from pyvbmc.vbmc import VBMC


D = 2  # still in 2-D


def log_likelihood(theta):
    """D-dimensional Rosenbrock's banana function."""
    theta = np.atleast_2d(theta)

    x, y = theta[:, :-1], theta[:, 1:]
    return -np.sum((x**2 - y) ** 2 + (x - 1) ** 2 / 100, axis=1)


prior_tau = 3 * np.ones((1, D))  # Length scale of the exponential prior


def log_prior(x):
    """Independent exponential prior."""
    x = np.atleast_2d(x)  # handle batched input like the likelihood does
    return np.sum(scs.expon.logpdf(x, scale=prior_tau), axis=1)


def log_joint(x):
    """log-density of the joint distribution."""
    return log_likelihood(x) + log_prior(x)


LB = np.zeros((1, D))  # Lower bounds


UB = 10 * prior_tau  # Upper bounds


PLB = scs.expon.ppf(0.159, scale=prior_tau)
PUB = scs.expon.ppf(0.841, scale=prior_tau)


x0 = np.ones((1, D))  # Optimum of the Rosenbrock function


# initialize VBMC
user_options = {"plot": True}
vbmc = VBMC(log_joint, x0, LB, UB, PLB, PUB, user_options)
# Printing will give a summary of the vbmc object:
# bounds, user options, etc.:
print(vbmc)


# run VBMC
vp, elbo, elbo_sd, _, _ = vbmc.optimize()


lml_true = -1.836  # ground truth, which we know for this toy scenario

print("The true log model evidence is:", lml_true)
print("The obtained ELBO is:", format(elbo, ".3f"))
print("The obtained ELBO_SD is:", format(elbo_sd, ".3f"))


print(vp)


# Generate samples from the variational posterior
n_samples = int(3e5)
Xs, _ = vp.sample(n_samples)

# We compute the pdf of the approximate posterior on a 2-D grid
plot_lb = np.zeros(2)
plot_ub = np.quantile(Xs, 0.999, axis=0)
x1 = np.linspace(plot_lb[0], plot_ub[0], 400)
x2 = np.linspace(plot_lb[1], plot_ub[1], 400)

xa, xb = np.meshgrid(x1, x2)  # Build the grid
xx = np.vstack(
    (xa.ravel(), xb.ravel())
).T  # Convert grids to a vertical array of 2-D points
yy = vp.pdf(xx)  # Compute PDF values on specified points


# Plot approximate posterior pdf (this interactive plot does not work in higher D)
import plotly.graph_objects as go

fig = go.Figure(
    data=[
        go.Surface(z=yy.reshape(x1.size, x2.size), x=x1, y=x2, showscale=False)
    ]
)
fig.update_layout(
    autosize=False,
    width=800,
    height=500,
    scene=dict(
        xaxis_title="x_0",
        yaxis_title="x_1",
        zaxis_title="Approximate posterior pdf",
    ),
)

# Compute and plot approximate posterior mean
post_mean = np.mean(Xs, axis=0)
fig.add_trace(
    go.Scatter3d(
        x=[post_mean[0]],
        y=[post_mean[1]],
        z=vp.pdf(post_mean),
        name="Approximate posterior mean",
        marker_symbol="diamond",
    )
)

# Find and plot approximate posterior mode
# post_mode = vp.mode()
# fig.add_trace(go.Scatter3d(x=[post_mode[0]], y=[post_mode[1]], z=vp.pdf(post_mode), name="Approximate posterior mode", marker_symbol="x"))


# Display posterior also as a non-interactive static image
from IPython.display import Image

Image(fig.to_image(format="png"))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [16], line 1
----> 1 assert False  # skip this cell
      3 import numpy as np
      4 import scipy.stats as scs

AssertionError: 

Acknowledgments

Work on the PyVBMC package was funded by the Finnish Center for Artificial Intelligence FCAI.